Generating music whose emotion matches that of an input video is a highly relevant problem today. Video content creators and automatic movie directors benefit from keeping their viewers engaged, which can be facilitated by producing novel material that elicits stronger emotions in them. Moreover, there is currently a demand for more empathetic computers to aid humans in applications such as augmenting the perception ability of visually- and/or hearing-impaired people. Current approaches overlook the video's emotional characteristics in the music generation step, only consider static images instead of videos, are unable to generate novel music, and require a high level of human effort and skill. In this study, we propose a novel hybrid deep neural network that uses an Adaptive Neuro-Fuzzy Inference System to predict a video's emotion from its visual features and a deep Long Short-Term Memory Recurrent Neural Network to generate the corresponding audio signals with a similar emotional tone. The former models emotions appropriately thanks to its fuzzy properties, and the latter models temporally dynamic data well because it retains previous hidden-state information. The novelty of our proposed method lies in extracting visual emotional features and transforming them into audio signals with corresponding emotional aspects for users. Quantitative experiments show low mean absolute errors of 0.217 and 0.255 on the Lindsey and DEAP datasets, respectively, and similar global features in the spectrograms. This indicates that our model appropriately performs domain transformation between visual and audio features. Experimental results show that, on both datasets, our model can effectively generate audio that matches the scene and elicits a similar emotion from the viewer, and that music generated by our model is also chosen more often (code available online at https://github.com/gcunhase/Emotional-Video-to-Audio-with-ANFIS-DeepRNN).
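To make the two-stage pipeline described above more concrete, the following is a minimal sketch in PyTorch, not the authors' implementation: a simplified Takagi-Sugeno fuzzy layer stands in for the ANFIS emotion predictor, a stacked LSTM maps the emotion-conditioned frame features to audio frames, and all dimensions (128-dim visual features, valence/arousal output, 80-dim audio frames, hidden sizes) are illustrative assumptions.

```python
# Illustrative sketch of the abstract's two-stage idea, NOT the paper's exact model:
# (1) a fuzzy-inference layer maps per-frame visual features to an emotion estimate,
# (2) a deep LSTM turns the emotion-augmented feature sequence into audio frames.
import torch
import torch.nn as nn


class FuzzyEmotionLayer(nn.Module):
    """ANFIS-like stand-in: Gaussian rule memberships weight linear consequents."""

    def __init__(self, in_dim, n_rules=8, out_dim=2):  # out_dim: valence, arousal (assumed)
        super().__init__()
        self.centers = nn.Parameter(torch.randn(n_rules, in_dim))
        self.log_sigma = nn.Parameter(torch.zeros(n_rules, in_dim))
        self.consequents = nn.Linear(in_dim, n_rules * out_dim)
        self.out_dim = out_dim

    def forward(self, x):                               # x: (batch, in_dim)
        diff = x.unsqueeze(1) - self.centers             # (batch, rules, in_dim)
        firing = torch.exp(-(diff ** 2 / (2 * self.log_sigma.exp() ** 2)).sum(-1))
        weights = firing / (firing.sum(-1, keepdim=True) + 1e-8)
        rule_out = self.consequents(x).view(x.size(0), -1, self.out_dim)
        return (weights.unsqueeze(-1) * rule_out).sum(1)  # (batch, out_dim)


class EmotionToAudioLSTM(nn.Module):
    """Deep LSTM mapping emotion-augmented frame features to audio frames
    (e.g. spectrogram columns); sizes are illustrative."""

    def __init__(self, vis_dim=128, audio_dim=80, hidden=256, layers=2):
        super().__init__()
        self.fuzzy = FuzzyEmotionLayer(vis_dim)
        self.lstm = nn.LSTM(vis_dim + 2, hidden, layers, batch_first=True)
        self.head = nn.Linear(hidden, audio_dim)

    def forward(self, frames):                           # frames: (batch, T, vis_dim)
        b, t, d = frames.shape
        emo = self.fuzzy(frames.reshape(b * t, d)).view(b, t, -1)
        h, _ = self.lstm(torch.cat([frames, emo], dim=-1))
        return self.head(h)                              # (batch, T, audio_dim)


# Usage example: 4 clips, 32 frames each, 128-dim visual features per frame.
model = EmotionToAudioLSTM()
audio_frames = model(torch.randn(4, 32, 128))
print(audio_frames.shape)  # torch.Size([4, 32, 80])
```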